Abstract. There are multiple proposed interpretations of probability theory: one such interpretation
is true-false logic under uncertainty. Cox’s Theorem is a representation theorem that
states, under a certain set of axioms describing the meaning of uncertainty, that every true- 
false logic under uncertainty is isomorphic to conditional probability theory. This result was 
used by Jaynes to develop a philosophical framework in which statistical inference under uncertainty
should be conducted through the use of probability, via Bayes’ Rule. Unfortunately,
most existing correct proofs of Cox’s Theorem require restrictive assumptions: for instance, 
many do not apply even to the simple example of rolling a pair of fair dice. We offer a new 
axiomatization by replacing various technical conditions with an axiom stating that our theory 
must be consistent with respect to repeated events. We discuss the implications of our results, 
both for the philosophy of probability and for the philosophy of statistics. 

Keywords: conditional probability, foundations of probability, foundations of statistics, true- 
false logic under uncertainty. 


1 Introduction 

In statistical practice, and in related fields of research that involve studying data, such as machine
learning, understanding uncertainty from an axiomatic and quantitative perspective is of fundamental
importance. The mathematics for doing this is well established through decades of practice: a
standard approach is to begin by specifying a probabilistic model for the data, and then to use one of a
variety of methods to infer the parameters of the model from the data. Typical approaches for point
estimation include regularized maximum likelihood from a frequentist perspective, and maximum
a posteriori from a Bayesian perspective. These two approaches are of course mathematically identical:
we may choose to interpret them through any philosophical lens we wish, be it frequentist,
Bayesian, or some other isomorphic perspective.

More generally, it is natural to ponder the relationship between mathematics and philosophy in 
the statistical inference problem, and to ask questions about the philosophical applicability of the 
Bayesian statistical paradigm to real-world problems. Under what assumptions should reasoning
under uncertainty be performed through the use of conditional probability - through the use of
Bayes’ Rule? If we choose to be philosophically Bayesian when interpreting a statistical model,
can we reasonably connect the assumptions required to do so to the real-world problem being
studied? Does the use of probability theory itself in the context of a Bayesian model contribute to
its uncertainty, within the space of other possible theories of reasoning we might consider?


These and other related questions have a long history of study from many perspectives. We focus
here on the line of reasoning based on the works of Cox, who derived conditions under which
abstract reasoning under uncertainty is isomorphic to finitely additive probability theory, and Jaynes
[29], who constructed the philosophical framework under which Cox’s Theorem may be applied to
the inference problem to obtain the Bayesian statistical paradigm. The ideas originated in the
physics community going back to Schrödinger, but have since been studied by philosophers
[12], pure mathematicians [26], computer scientists [25], and others. In the artificial intelligence
context, Cheeseman has called Cox’s Theorem the “strongest argument for the use of standard
(Bayesian) probability theory”.

Unfortunately, all known proofs to date are either incorrect due to subtle issues involving domains,
or contain assumptions that limit their generality - the vast majority do not even apply to the case
of tossing a pair of fair six-sided dice. Indeed, Paris [38] has said that “when an attempt is made to
fill in all the details [of Cox’s proof] some of the attractiveness of the original is lost” - rigorizing
Cox’s work in a philosophically satisfying way is our primary aim. Further, no previous proof of
Cox’s Theorem yields countable additivity - we obtain it using ideas previously proposed for the
de Finetti system that also turn out to significantly simplify regularity conditions used by other
authors. Finally, Cox’s Theorem, in spite of its philosophical significance, is not widely known. In
this work, we contribute the following.


1. We present Cox’s Theorem and its proof, and the Jaynesian interpretation of probability, in 
a manner that is readily accessible to the probability and statistics communities. 

2. Our proof’s assumptions are more general and, in our framework, more natural from a philosophical
perspective, compared to previous correct attempts.

3. Our proof yields a countably additive probability theory. 


In our approach, we consider both philosophy and mathematics. From a mathematical perspective, 
all frameworks we review and consider will, under appropriate assumptions, yield the standard 
probability theory of Kolmogorov and all theorems that follow from it - one may derive results 
using any theory, switching from one perspective to another as convenient. On the other hand, the 
philosophical frameworks used to justify axioms are all different, and all of inherent interest on their 
own. 


Throughout, we consider the relationship between philosophies of probability, and of statistical
inference. The frequentist interpretation of probability has been used to justify Neyman-Pearson
Hypothesis Testing and Neyman’s Theory of Confidence Intervals as a way to quantify uncertainty
for decades. We review and describe major frameworks of probability that are used in motivating
approaches to inference. The framework of Jaynes [29] is our primary focus of study.


In this work, our approach is descriptive rather than evaluative. We seek to present various philosophical
theories of probability and the mathematical assumptions that are justified on their basis.
We do not consider whether one philosophical framework is superior to another: in particular we 
are explicitly not interested in debating the merits of Bayesian formulations compared to frequentist 
ones, nor of the de Finetti approach compared to Jaynes’ approach. 


An outline of the rest of the paper is as follows. In Section 2 we describe the history and prior work
involved in different axiomatizations of probability and the development of the proof of Cox’s Theorem,
highlighting issues of domain that have led to mathematical errors in many previous attempts.
In Section 3 we state our definitions and axioms used to construct probability theory, presenting
theorems and proofs showing that our construction yields the standard theory of Kolmogorov
in Section 4. The final section concludes with a discussion.


2 History and Previous Work 


Historically, the study of probability began in 1654 with an exchange of letters between
Pascal and Fermat, who developed a notion of probability based on equipossible outcomes. Conditional
probability was defined by Bayes and Laplace. The frequentist interpretation based
on geometric notions of area and limiting notions of repeated events began with the work of Cournot
[13] and Venn [53]. We now review and describe the two modern axiomatizations that are most
familiar to the probability and statistics communities, and present the framework of our analysis.


2.1. Kolmogorov. Mathematically rigorous formulations of probability began in the early-to-mid
20th century. The first such formulation was the work of Kolmogorov, who, in modern notation,
assumed the following.

1. (Definition: Probability Triple) Let $\Omega$ be a set, $\mathcal{F}$ be a $\sigma$-algebra on $\Omega$, and $\mathbb{P}$ be a set function.

2. (Axiom: Normalization) $\mathbb{P}(\Omega) = 1$.

3. (Axiom: Non-negativity) $\mathbb{P}(A) \geq 0$ for all $A \in \mathcal{F}$.

4. (Axiom: Countable Additivity) $\mathbb{P}\left(\bigcup_{i=1}^{\infty} A_i\right) = \sum_{i=1}^{\infty} \mathbb{P}(A_i)$ for all countable collections of
disjoint sets $A_i \in \mathcal{F}$.

Kolmogorov justified his axioms philosophically through the introduction of a repeated-sampling
analogy where $m/n$ represents the frequency of some event, and used this to argue, for instance,
that $0 \leq m/n \leq 1$ justifies the normalization and non-negativity axioms.* This interpretation has
been studied further by many authors, such as Von Mises.

The frequentist perspective forms the philosophical groundwork on which the classical Neyman- 
Pearson Theory is based. Its notion of uncertainty involves considering alternative data sets that 
could have been observed. For example, a p-value is defined to be the repeated-sampling probability
of seeing data similar to or more extreme than what was observed. 


2.2. De Finetti. The second rigorous system of the 20th century was developed by de Finetti [18].
He began his philosophical justification by introducing a setting in which a person wishing to reason
sensibly, who, following Good, we refer to as You, is wagering money against an opponent O,
regarding some true/false proposition A. The task is to set the price P such that if A turns out to
be true, You will pay O 1 monetary unit. O is permitted to either:


1. buy Your promise for P, or 

2. force You to buy the same promise from O at the same price. 

*We refer here to Kolmogorov’s 1936 monograph, Section 2 “Relationship with events of an experiment”,
subsection “Empirical deduction of axioms”, translated to English by the first author of this work.

Then de Finetti introduced a set of axioms ensuring that there exists a price such that there are no
bets in which O wins no matter what. He showed that from these axioms, P must be conditional 
probability, with all of Kolmogorov’s axioms following as theorems, except for countable additivity. 


In the past century, many authors such as Fishburn [21] and Lad have studied de Finetti’s
system further, making the domain of the theory explicit and rigorous, and developing it both as
a finitely additive theory of probability, and as a countably additive theory with the addition of a
monotone continuity postulate, as in Bernardo and Smith. The system has been used extensively
in the development by various authors of the subjectivist interpretation of probability of Ramsey.

The de Finetti axiomatization has been used as the philosophical groundwork for Bayesian
approaches to statistical inference. Many have used it to argue that the fundamental notion of
uncertainty in statistical inference should be taken to be probability theory. Examples include the work
of Savage and Bernardo and Smith.


2.3. Cox and Jaynes. The final system we describe in some detail, and the one we focus on here,
began, following Schrödinger, with the work of Cox, which we present in the framework
of Jaynes [29]. Jaynes began his approach to inference by introducing the metaphor of a thinking
robot that reasons sensibly about logical true/false propositions. If all propositions are known to be
exactly true or false, then Jaynes assumed that they are reasoned about using true/false logic, i.e.,
inference should be performed according to a Boolean algebra. Wishing to extend this to logical
reasoning under uncertainty, he then wrote down a set of desiderata defining what it means to
reason sensibly.

I. States of uncertainty are represented by real numbers. 

II. Qualitative correspondence with common sense. 

(a) If the truth value of a proposition increases, its probability must also increase. 

(b) In the limit, small changes in propositions must yield small changes in probabilities. 

III. Consistency with true-false logic. 

(a) Probabilities that depend on multiple propositions cannot depend on the order in which 
they are presented. 

(b) All known propositions must be used in reasoning - nothing can be arbitrarily ignored. 

(c) If, in two settings, the propositions known to be true are identical, the probabilities must 
be as well. 

From here, Jaynes deduced a set of axioms formalizing the above, and following Cox, showed
that any system of reasoning used must be, up to a transformation, conditional probability theory 
- thus, statistical inference should be performed using probability theory. Jaynes’ version of Cox’s 
Theorem yields all of Kolmogorov’s axioms as theorems, except for countable additivity, just as in 
the de Finetti system. 


Unfortunately, Jaynes’ writing contains mathematical errors - his proof as written is incorrect.
Jaynes was vehement in what he referred to as the Finite-Sets Policy, under which he was only
interested in mathematics involving finite sets. On the other hand, Halpern has shown via
explicit counterexample that a lemma used in Jaynes’ proof can only hold if a particular set is
infinite. Cox’s work does not explicitly mention this requirement, so it is also not fully rigorous,
modulo precise interpretation of his words (Snow interprets Cox as assuming a sufficient axiom
implicitly).

The underlying issue is that Cox’s proof involves the use of functional equations - however, the
inputs to one of the equations may not span the full range needed to constrain the theory. Even if
this were not an issue, the proof would still need additional arguments in the finite case, because
solutions of functional equations often depend heavily on the domain they are defined on - the
Cauchy equation is a famous example, admitting very different solutions on $\mathbb{Q}$ compared to $\mathbb{R}$.
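To make the domain dependence concrete, the following is a minimal sketch, not taken from the works cited here. It uses the hypothetical domain $\{a + b\sqrt{2} : a, b \in \mathbb{Q}\}$, which sits strictly between $\mathbb{Q}$ and $\mathbb{R}$, and a hypothetical map `f` that keeps only the rational part. On $\mathbb{Q}$ every additive function is of the form $f(x) = cx$, whereas on the larger domain `f` is additive without being of that form.

```python
from fractions import Fraction
from itertools import product

# Elements a + b*sqrt(2) are stored exactly as pairs (a, b) of rationals.
def add(u, v):
    return (u[0] + v[0], u[1] + v[1])

def f(u):
    """Hypothetical additive map: keep the rational part, drop the sqrt(2) part."""
    return u[0]

def value(u):
    """Floating-point value of a + b*sqrt(2)."""
    return float(u[0]) + float(u[1]) * 2 ** 0.5

rationals = [Fraction(n, d) for n in range(-3, 4) for d in (1, 2, 3)]
points = [(a, b) for a in rationals[:6] for b in rationals[:6]]

# f satisfies Cauchy's equation f(u + v) = f(u) + f(v) on every tested pair.
additive = all(f(add(u, v)) == f(u) + f(v) for u, v in product(points, repeat=2))

# Restricted to the rationals (b = 0), f is linear: f(q) = q * f(1).
one = (Fraction(1), Fraction(0))
linear_on_Q = all(f((q, Fraction(0))) == q * f(one) for q in rationals)

# On the larger domain f is not of the form c*x: the ratio f(x)/x changes.
sqrt2 = (Fraction(0), Fraction(1))
slope_at_one = f(one) / 1
slope_at_sqrt2 = float(f(sqrt2)) / value(sqrt2)
```

The two slopes differ, so no single constant $c$ works on the enlarged domain, even though the same equation pins $f$ down completely on $\mathbb{Q}$.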


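One of the only correct proofs of Cox’s theorem from assumptions founded in Jaynes’ desiderata is
given by Paris [38], who assumes the following:

• (Density). For all $0 \leq \alpha, \beta, \gamma \leq 1$ and $\varepsilon > 0$, there exist sets $U_1 \supseteq U_2 \supseteq U_3 \supseteq U_4$ such that (a)
$U_3 \neq \emptyset$, (b) $|\mathbb{P}(U_4 \mid U_3) - \alpha| < \varepsilon$, (c) $|\mathbb{P}(U_3 \mid U_2) - \beta| < \varepsilon$ and (d) $|\mathbb{P}(U_2 \mid U_1) - \gamma| < \varepsilon$.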


This is called density because, as stated, it requires that the range of $\mathbb{P}$ is dense in an appropriate
interval, which forces the underlying logic to contain at minimum countably many propositions.
Thus, Cox’s Theorem as written in Paris [38], and similar variations such as in Van Horn, do
not apply even to the simple case of rolling a pair of fair 6-sided dice. Further, though correct,
Paris’ proof is slightly less general than other variants - see Section |3 
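The failure of density for two dice can be seen by direct enumeration. The sketch below assumes the uniform measure on the 36 outcomes, so that every conditional probability has the form $|A \cap B| / |B|$; the target value `alpha` is an arbitrary choice made for illustration. Only one of the three conditional probabilities in the density condition is examined, which is already enough to show that it cannot hold for small $\varepsilon$.

```python
from fractions import Fraction
import math

# Under the uniform measure on 36 outcomes, P(A | B) = |A ∩ B| / |B|,
# so the attainable values are exactly the fractions k/m with 0 <= k <= m <= 36.
N_OUTCOMES = 36
attainable = sorted(
    {Fraction(k, m) for m in range(1, N_OUTCOMES + 1) for k in range(m + 1)}
)

def distance_to_nearest(target):
    """Distance from target to the closest attainable conditional probability."""
    return min(abs(float(p) - target) for p in attainable)

alpha = 1 / math.sqrt(2)   # arbitrary irrational target in (0, 1)
gap = distance_to_nearest(alpha)

# Widest interval between neighbouring attainable values.
largest_hole = max(b - a for a, b in zip(attainable, attainable[1:]))
```

Because the attainable set is finite, `gap` is strictly positive, so the density condition fails for any $\varepsilon$ smaller than `gap`.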


2.4. Other previous work. A number of other approaches to rigorizing Cox’s Theorem have also been 
proposed. These are briefly summarized below, starting with the assumptions they require. 

1. Lattice Symmetries. Knuth and Skilling have proposed variations applicable to finite
domains.

2. Refinability. Arnborg and Sjödin describe assumptions for infinite domains that are similar
to density but more philosophically natural in certain settings.

3. Strong Rescaling. Dupré and Tipler show that under this assumption (we refer to their
Axiom 5), a variation on Cox’s Theorem that uses ideas from de Finetti becomes trivial to
prove.

4. Linear Order, Dividedness, and Archimedeanity. Hardy offers a proof from these assumptions.

5. Confidence Spaces. Zimmermann and Cremers propose another variation on Cox’s Theorem,
and study its relationship with other approaches to uncertainty.

6. Non-Boolean Logic. Colyvan [11], [12] has pointed out that Cox’s Theorem requires the Law
of the Excluded Middle, and can thus fail to hold when the underlying logic of reasoning is
not Boolean.


All of this work is important, useful, and applicable in the respective authors’ settings. 


2.5. Other systems of probability. Other axiomatizations of probability have been constructed beyond
those of Kolmogorov, de Finetti, and Cox. The logical interpretation has been developed by Johnson
[31], Keynes [32], Jeffreys [30], and extensively by Carnap [8], who viewed probability as the degree
of confirmation that empirical evidence gives to a proposition. Popper [39] created an objective
propensity theory of probability based on a set of axioms created for that setting. Rényi
developed an axiomatization and framework for conditional probability. All of these frameworks
have been fairly recently reviewed by Hájek. Shafer and Vovk have defined a notion of
game-theoretic probability using the tools of non-standard analysis, and used it to derive many
classical limit theorems in this setting. Briggs has argued that all probability is conditional and
should be considered part of logic. We do not focus on any of these approaches here because they
are chiefly concerned with the meaning of probability, rather than its implications for the philosophy
of statistical inference.


2.6. Current status. No current proof of Cox’s Theorem for constructing the Jaynesian interpretation
of probability is able to cover both the simple example of rolling a pair of fair 6-sided dice, and
spaces of functions such as those used in Bayesian Nonparametric inference, where infinite sets and
countable additivity are necessary to avoid counterexamples (even if we were to restrict ourselves to
finite sets, countable additivity is important, as without it our models may degenerate under mesh
refinement). We build on the approaches of Paris and Van Horn to construct such a
variant, by introducing an alternative to density that is more natural from a philosophical point of
view, and does not require restricting ourselves to infinite domains.


3 Definitions and Axioms 


Here we list precisely what definitions and axioms we need for Cox’s Theorem - and the Jaynesian
interpretation of probability - to apply. We defer all consequences of these definitions and axioms to
Section 4. To avoid repetitive statements, we assume throughout that any sets $A, B$ in the function
$\mathbb{P}(A \mid B)$ are elements of $\mathcal{F}$, and that any set $B$ on the right-hand side of the conditioning bar is
non-empty.


All of our axioms are justified using the framework of Jaynes [29] - as this is our focus, we do not
consider alternative frameworks here. See Section 2 and the concluding discussion for an overview of
other perspectives, in which some of our axioms (for instance Axiom 2) may be controversial.


Axiom 1 (Probability is a Real Number).

Let $\Omega$ be a set, and let $\mathcal{F}$ be a $\sigma$-algebra on $\Omega$. Let $\mathbb{P} : \mathcal{F} \times (\mathcal{F} \setminus \emptyset) \to R \subseteq \mathbb{R}$ be a function, written
using the notation $\mathbb{P}(A \mid B)$.


Justification. Jaynes [29], Desiderata I (States of uncertainty are represented by real numbers). A
$\sigma$-algebra is chosen as the domain because by Stone’s Representation Theorem, every Boolean
algebra is isomorphic to an algebra of sets, which will be a $\sigma$-algebra if the Boolean algebra is
$\sigma$-complete by the Loomis-Sikorski Representation Theorem. This allows us to
choose our formalism to match that of the Kolmogorov system - indeed, it provides a philosophical
reason why Kolmogorov’s system should be formulated using a $\sigma$-algebra rather than something
else. This also immediately differentiates probability theory from other theories of uncertainty not
represented by real numbers - see Section O 


Axiom 2 (Sequential Continuity).

We have that

$$A_1 \subseteq A_2 \subseteq A_3 \subseteq \ldots \text{ such that } A_i \nearrow A \text{ implies } \mathbb{P}(A_i \mid B) \nearrow \mathbb{P}(A \mid B) \tag{1}$$

for all $A, A_i, B$.


Justification. Jaynes [29], Desiderata II (Qualitative correspondence with common sense). This
axiom provides a natural way to formalize both the necessary ordering between propositions that
are true and those that are false, and the notion that small changes in truth values must yield small
differences in probabilities. In particular, it allows us to take limits in the appropriate fashion, and
assuming it is similar to working with a Hilbert space rather than an inner product space, or on $\mathbb{R}$
rather than $\mathbb{Q}$.


Axiom 3 (Decomposability).

$\mathbb{P}(AB \mid C)$ can be written as

$$\mathbb{P}(A \mid C) \circ \mathbb{P}(B \mid AC) \tag{2}$$

for some function $\circ : (R \times R) \to R$.


Justification. Jaynes [29], Desiderata III (Consistency with true-false logic). This directly formalizes
Jaynes’ desiderata that compound logical propositions should decompose into simpler propositions.
See also Tribus for a discussion on why $\circ$ should be chosen with two arguments rather than
three or four. Note that we do not assume commutativity; it will instead be proven.


Axiom 4 (Negation).

There exists a function $N : R \to R$ such that

$$\mathbb{P}(A^c \mid B) = N[\mathbb{P}(A \mid B)] \tag{3}$$

for all $A, B$.

Justification. Jaynes [29], Desiderata III (Consistency with true-false logic). Every element of a
Boolean algebra is mapped uniquely to its negation, thus the probability that a proposition is true
should uniquely determine the probability that it is false.

Axiom 5 (Consistency Under Extension).

If $(\Omega, \mathcal{F}, \mathbb{P})$ satisfies the axioms above, then $(\Omega \times \Omega, \mathcal{F} \otimes \mathcal{F}, \mathbb{P} \circ \mathbb{P})$ must as well, i.e., the definition
$\mathbb{P}(A \times B \mid C \times D) = \mathbb{P}(A \mid C) \circ \mathbb{P}(B \mid D)$ is consistent.

Justification. Jaynes [29], Desiderata III (Consistency with true-false logic). We argue that since
Boolean algebras can always be defined in a way that allows one to consider what might happen
if the same event were to be repeated, our theory of uncertainty should therefore allow this as
well.
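As a sanity check on what Axiom 5 asks for, the sketch below takes the ordinary case in which the composition rule is multiplication and the measure is uniform on a single die, and confirms that the product definition agrees with direct counting on the doubled space. The particular events are arbitrary choices made for illustration.

```python
from fractions import Fraction
from itertools import product

def prob(event, given):
    """Conditional probability under the uniform measure on a finite outcome set."""
    event, given = set(event), set(given)
    return Fraction(len(event & given), len(given))

# Events for the first roll (A given C) and the second roll (B given D).
A, C = {2, 4, 6}, {1, 2, 3, 4}
B, D = {5, 6}, {3, 4, 5, 6}

# Left side: count outcomes directly in the product space of two rolls.
lhs = prob(product(A, B), product(C, D))

# Right side: compose the single-roll probabilities by multiplication.
rhs = prob(A, C) * prob(B, D)
```

Here both sides are computed exactly as fractions, so their agreement does not depend on floating-point tolerance.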


4 Theorems and Proofs 

We begin with a definition to encompass the algebraic structure we wish to analyze.

Definition 1 (True-false Logic under Uncertainty).

Let $(\Omega, \mathcal{F}, \mathbb{P}, \circ, R, N)$ be an algebraic structure satisfying Axioms 1, 2, 3, 4 and 5. Call this a
true-false logic under uncertainty.

Consider first two cases: (1) the degenerate $\sigma$-algebra $\mathcal{F} = \{\emptyset, \Omega\}$, and (2) the trivial $\mathbb{P}(A \mid B) \in
\{a, b\}$. In both cases, it follows immediately that the algebraic structure in question is isomorphic
to conditional probability. Call $\mathcal{F}$ nondegenerate if it contains at least 4 elements, and $\mathbb{P}$ nontrivial
if there exist $A, B$ such that $\mathbb{P}(\emptyset \mid B) < \mathbb{P}(A \mid B) < \mathbb{P}(\Omega \mid B)$, and assume henceforth that $\mathcal{F}$
and $\mathbb{P}$ satisfy these conditions. We can now proceed to deduce the structure of $\mathbb{P}$ from the algebraic
properties of our domain.

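Lemma 2 (Monotonicity).

The function $\circ$ defined in Axiom 3 is continuous and nondecreasing in both arguments. Furthermore,
for all $\mathbb{P}(A \mid B) \neq \mathbb{P}(\emptyset \mid B)$, it is strictly increasing.

Proof. Consider a sequence $\emptyset \subseteq \ldots \subseteq A_i \subseteq A_{i+1} \subseteq \ldots \subseteq \Omega$. By Axiom 3, for arbitrary $B, C$ consider

$$\mathbb{P}(A_i B \mid C) = \mathbb{P}(B \mid C) \circ \mathbb{P}(A_i \mid BC). \tag{4}$$

By Axiom 2 we get $\mathbb{P}(A_i B \mid C) \leq \mathbb{P}(A_{i+1} B \mid C)$, and $\mathbb{P}(A_i B \mid C) \nearrow \mathbb{P}(B \mid C)$. Thus, $\circ$ is
continuous in its second argument. Also, if $\mathbb{P}(B \mid C) \neq \mathbb{P}(\emptyset \mid C)$, by monotonicity there exists a
subsequence $A_i^*$ such that

$$\mathbb{P}(A_i^* B \mid C) < \mathbb{P}(A_{i+1}^* B \mid C). \tag{5}$$

Hence, for $\mathbb{P}(B \mid C) \neq \mathbb{P}(\emptyset \mid C)$, the function $\circ$ is strictly increasing in its second argument. Now,
apply Axiom 5 and consider $\Omega \times \Omega$ and $\mathbb{P}(A_i \times B \mid C \times C)$ where $A_i$ is defined as above. The axiom
allows us to write

$$\mathbb{P}(A_i \times B \mid C \times C) = \mathbb{P}(A_i \mid C) \circ \mathbb{P}(B \mid C), \tag{6}$$

but since $\emptyset \subseteq \ldots \subseteq A_i \times B \subseteq A_{i+1} \times B \subseteq \ldots \subseteq \Omega \times B$, following the same logic we get that for
$\mathbb{P}(B \mid C) \neq \mathbb{P}(\emptyset \mid C)$, $\circ$ is also continuous and strictly increasing in its first argument. Similarly, if
we allow $\mathbb{P}(B \mid C) = \mathbb{P}(\emptyset \mid C)$, then by the same argument the function $\circ$ is nondecreasing in both
arguments, and the result follows. ■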

Lemma 3 (Cancellativity).

For all $\mathbb{P}(A \mid B) \neq \mathbb{P}(\emptyset \mid B)$, the function $\circ$ defined in Axiom 3 is cancellative, i.e. $x \circ y = x \circ z$
implies $y = z$ and $x \circ y = z \circ y$ implies $x = z$.

Proof. Consider

$$\mathbb{P}(AB \mid C) = \mathbb{P}(A \mid C) \circ \mathbb{P}(B \mid AC) = \mathbb{P}(B \mid C) \circ \mathbb{P}(A \mid BC) \tag{7}$$

and choose $A, B, C$ such that

$$\mathbb{P}(A \mid C) = \mathbb{P}(B \mid C) \neq \mathbb{P}(\emptyset \mid C). \tag{8}$$

Then since $\circ$ is strictly increasing, we have

$$\mathbb{P}(B \mid AC) = \mathbb{P}(A \mid BC). \tag{9}$$

Hence if we let $x = \mathbb{P}(B \mid C)$, $y = \mathbb{P}(B \mid AC)$, and $z = \mathbb{P}(A \mid BC)$, we get that

$$x \circ y = x \circ z \text{ implies } y = z \tag{10}$$

and analogously, by repeating the construction in Lemma 2, we get

$$x \circ y = z \circ y \text{ implies } x = z. \tag{11}$$

Since $A$ and $C$ are arbitrary, this holds for all $x, y, z \neq \mathbb{P}(\emptyset \mid C)$ on which $\circ$ is defined, and thus $\circ$
is cancellative. ■

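Lemma 4 (Uniqueness of Identity Element).

For all $A, B$, we have that $\mathbb{P}(\Omega \mid A) = \mathbb{P}(\Omega \mid B) = e$ and $\mathbb{P}(\emptyset \mid A) = \mathbb{P}(\emptyset \mid B)$, moreover for any $x$
we have $x \circ e = x$ and $e \circ x = x$.


Proof. First, note that

$$\mathbb{P}(A \mid C) = \mathbb{P}(A\Omega \mid C) = \mathbb{P}(\Omega \mid C) \circ \mathbb{P}(A \mid \Omega C) = \mathbb{P}(\Omega \mid C) \circ \mathbb{P}(A \mid C). \tag{12}$$

Hence, there exists an $e \in R$ such that

$$e \circ x = x. \tag{13}$$

Now, to show uniqueness, suppose there exists a $d \in R$ such that $d \circ x = x$ for all $x$. Then we can
write

$$d \circ d = d \quad \text{and} \quad e \circ d = d. \tag{14}$$

But since these are equal, we have

$$d \circ d = e \circ d \tag{15}$$

which by Lemma 3 implies

$$d = e \tag{16}$$

and hence $\mathbb{P}(\Omega \mid A) = e = \mathbb{P}(\Omega \mid B)$ for all $A, B$. By applying Axiom 4 to both sides, it follows that
$\mathbb{P}(\emptyset \mid A) = N(e) = \mathbb{P}(\emptyset \mid B)$ for all $A, B$. ■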


Lemma 5 (Associativity Equation: Constrained Triples).

The function $\circ$ defined in Axiom 3 satisfies

$$(x \circ y) \circ z = x \circ (y \circ z) \tag{17}$$

on the set of constrained triples $x = \mathbb{P}(C \mid D)$, $y = \mathbb{P}(B \mid CD)$ and $z = \mathbb{P}(A \mid BCD)$ for arbitrary
$A, B, C, D$.

Proof. Using Axiom 3 we can write

$$\mathbb{P}(ABC \mid D) = \mathbb{P}(BC \mid D) \circ \mathbb{P}(A \mid BCD) = [\mathbb{P}(C \mid D) \circ \mathbb{P}(B \mid CD)] \circ \mathbb{P}(A \mid BCD), \tag{18}$$

and

$$\mathbb{P}(ABC \mid D) = \mathbb{P}(C \mid D) \circ \mathbb{P}(AB \mid CD) = \mathbb{P}(C \mid D) \circ [\mathbb{P}(B \mid CD) \circ \mathbb{P}(A \mid BCD)] \tag{19}$$

and the result follows for $x, y, z$ on the set of constrained triples. ■

Remark 6.

We have so far shown that the domain of $(x \circ y) \circ z = x \circ (y \circ z)$ must contain triples $(x, y, z)$
equal to $\mathbb{P}(C \mid D)$, $\mathbb{P}(B \mid CD)$, $\mathbb{P}(A \mid BCD)$, respectively. This set of probabilities is the key idea
behind the counterexample in Halpern, in which a space is constructed where the functional
equation holds on all triples of the form $\mathbb{P}(C \mid D)$, $\mathbb{P}(B \mid CD)$, $\mathbb{P}(A \mid BCD)$ but not on all triples
$\mathbb{P}(C \mid D)$, $\mathbb{P}(B \mid D)$, $\mathbb{P}(A \mid D)$, as is needed for isomorphism. We now invoke Axiom 5 to prevent
this possibility.
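The constrained triples of Lemma 5 can be illustrated numerically in the familiar case where the composition rule is multiplication. The sketch below assumes the uniform measure on two dice and four arbitrarily chosen events; both groupings of the triple recover the joint conditional probability, which is exactly the chain-rule structure that limits which triples the associativity equation is known to cover.

```python
from fractions import Fraction
from itertools import product

OMEGA = set(product(range(1, 7), repeat=2))   # all outcomes of two dice

def prob(event, given):
    """Conditional probability under the uniform measure on OMEGA."""
    return Fraction(len(event & given), len(given))

# Arbitrary illustrative events.
A = {w for w in OMEGA if w[0] + w[1] >= 8}
B = {w for w in OMEGA if w[0] >= 3}
C = {w for w in OMEGA if w[1] % 2 == 0}
D = {w for w in OMEGA if w[0] != w[1]}

# A constrained triple: each probability conditions on the events before it.
x = prob(C, D)
y = prob(B, C & D)
z = prob(A, B & C & D)

left = (x * y) * z            # grouping used in equation (18)
right = x * (y * z)           # grouping used in equation (19)
direct = prob(A & B & C, D)   # the joint conditional probability
```

Note that `x`, `y` and `z` cannot be chosen independently here: they are tied together through the nested conditioning events, which is why a further argument is needed before the equation can be used on arbitrary triples.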


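Lemma 7 (Associativity Equation: Unconstrained Triples).

The function $\circ$ defined in Axiom 3 satisfies

$$(x \circ y) \circ z = x \circ (y \circ z) \tag{20}$$

on the set of unconstrained triples $x = \mathbb{P}(A \mid D)$, $y = \mathbb{P}(B \mid D)$ and $z = \mathbb{P}(C \mid D)$ for arbitrary
$A, B, C, D$.

Proof. Consider

$$(x \circ y) \circ z = x \circ (y \circ z)$$

again. Apply Axiom 5 twice and take

$$A^{\times} = (A, \Omega, \Omega), \quad B^{\times} = (\Omega, B, \Omega), \quad C^{\times} = (\Omega, \Omega, C), \quad D^{\times} = (D, D, D). \tag{21}$$

Then

$$\mathbb{P}(A^{\times} \mid B^{\times} C^{\times} D^{\times}) = \mathbb{P}[(A, \Omega, \Omega) \mid (D, BD, CD)] = \mathbb{P}(A \mid D) \circ \mathbb{P}(\Omega \mid BD) \circ \mathbb{P}(\Omega \mid CD) = \mathbb{P}(A \mid D) \tag{22}$$

where the last line follows by Lemma 4. Hence, we may choose $x, y, z$ such that $x = \mathbb{P}(C \mid D)$,
$y = \mathbb{P}(B \mid D)$ and $z = \mathbb{P}(A \mid D)$ for arbitrary $A, B, C, D$, and the result follows. ■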


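Lemma 8 (Associativity Equation: Interval).

The function $\circ$ defined in Axiom 3 satisfies

$$(x \circ y) \circ z = x \circ (y \circ z) \tag{23}$$

with $(x, y, z) \in [\mathbb{P}(\emptyset \mid B), \mathbb{P}(\Omega \mid B)]^3$.

Proof. Since $\mathbb{P}$ is assumed nontrivial, we may choose $C, D$ such that

$$\mathbb{P}(\emptyset \mid D) < \mathbb{P}(C \mid D) < \mathbb{P}(\Omega \mid D). \tag{24}$$

Now, by Axiom 5, consider $\mathbb{P}(C \times C \mid D \times D)$. We have that

$$\mathbb{P}(C \times C \mid D \times D) = \mathbb{P}(C \mid D) \circ \mathbb{P}(C \mid D) < \mathbb{P}(C \mid D) \circ \mathbb{P}(\Omega \mid D) = \mathbb{P}(C \mid D) \tag{25}$$

where the inequality follows by Lemma 2 and the last line follows by Lemma 4. Consider the
sequence

$$\mathbb{P}(C_i^{\times} \mid D_i^{\times}) = \mathbb{P}(C \times \ldots \times C \mid D \times \ldots \times D) \tag{26}$$

where the Cartesian product is repeated $i$ times. By (25) this sequence is monotone decreasing, and
it is bounded below by $\mathbb{P}(\emptyset \mid D)$, therefore it converges. Suppose that it converges to some value

$$\delta > \mathbb{P}(\emptyset \mid D), \tag{27}$$

and consider

$$\mathbb{P}(C \times C_i^{\times} \mid D \times D_i^{\times}) = \mathbb{P}(C \times C \times C_{i-1}^{\times} \mid D \times D \times D_{i-1}^{\times}). \tag{28}$$

We can write

$$\mathbb{P}(C \mid D) \circ \mathbb{P}(C_i^{\times} \mid D_i^{\times}) = \mathbb{P}(C \times C \mid D \times D) \circ \mathbb{P}(C_{i-1}^{\times} \mid D_{i-1}^{\times}) \tag{29}$$

and since the sequence converges

$$\mathbb{P}(C \mid D) \circ \delta = \mathbb{P}(C \times C \mid D \times D) \circ \delta. \tag{30}$$

Since $\circ$ is cancellative and $\delta \neq \mathbb{P}(\emptyset \mid D)$, we get

$$\mathbb{P}(C \mid D) = \mathbb{P}(C \times C \mid D \times D) \tag{31}$$

which is a contradiction. Therefore,

$$\delta = \mathbb{P}(\emptyset \mid D). \tag{32}$$

By Axiom 4, the negation of the probabilities in (26) converges to $\mathbb{P}(\Omega \mid D)$. Hence we may choose
$D$ such that for every $\varepsilon > 0$ there exists a $C$ such that

$$\mathbb{P}(\Omega \mid D) - \varepsilon < \mathbb{P}(C \mid D) < \mathbb{P}(\Omega \mid D). \tag{33}$$

But Axiom 5 requires that $\circ$ is closed under composition, and since it is also continuous, we must
have that $\circ$ is well-defined for $(x, y, z) \in [\mathbb{P}(\emptyset \mid B), \mathbb{P}(\Omega \mid B)]^3$, which is the desired result. ■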

Remark 9.

Lemma 8 is needed to demonstrate that $\circ$ satisfies the associativity equation on a closed interval
rather than an arbitrary set. As mentioned in Section 2, the solutions of functional equations may
depend heavily on the domain on which they are defined, and related assumptions.

Lemma 10 (Product Rule).

The algebraic structure $(\Omega, \mathcal{F}, \mathbb{P}, \circ)$ is isomorphic to $(\Omega, \mathcal{F}, \mathbb{P}, \times)$ where $\times$ denotes multiplication.


Proof. By Lemma 8 we have that $\circ$ satisfies the associativity equation $(x \circ y) \circ z = x \circ (y \circ z)$ where
$x, y, z$ are each contained in the closed interval $[\mathbb{P}(\emptyset \mid B), \mathbb{P}(\Omega \mid B)]$. Furthermore, by Lemma 2 $\circ$ is
continuous, by Lemma 3 it is cancellative, and by Lemma 4 there exists a unique identity element
$e \circ x = x$. From these assumptions, it is shown in Aczél (p. 268) that $\circ$ must satisfy

$$x \circ y = g^{-1}[g(x) + g(y)] \tag{34}$$

for a continuous strictly increasing function $g : [\mathbb{P}(\emptyset \mid B), \mathbb{P}(\Omega \mid B)] \to [\mathbb{P}(\emptyset \mid B), \mathbb{P}(\Omega \mid B)]$. The
function $g$ is shown to be unique up to rescaling in Aczél et al. See also Craigen and Páles [15]
and Paris [38]. We can substitute this into Axiom 3 to get

$$g[\mathbb{P}(AB \mid C)] = g[\mathbb{P}(B \mid C)] + g[\mathbb{P}(A \mid BC)] = g[\mathbb{P}(A \mid C)] + g[\mathbb{P}(B \mid AC)]. \tag{35}$$

Then

$$\exp\{g[\mathbb{P}(AB \mid C)]\} = \exp\{g[\mathbb{P}(B \mid C)]\} \exp\{g[\mathbb{P}(A \mid BC)]\} = \exp\{g[\mathbb{P}(A \mid C)]\} \exp\{g[\mathbb{P}(B \mid AC)]\} \tag{36}$$

and since $\exp$ and $g$ are strictly increasing and bounded on closed intervals, the result follows. ■
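The representation in (34) can be illustrated with a hypothetical regraduation of ordinary probability. In the sketch below, `h` is an arbitrary strictly increasing map of $[0, 1]$ onto itself chosen only for illustration; it is not taken from the references. The induced rule `circ` is not multiplication, yet it is associative, has identity $h(1) = 1$, and has an additive generator `g`, so undoing `h` recovers the product rule. The generator is only evaluated away from zero, where the logarithm is finite.

```python
import math

def h(p):
    """Hypothetical strictly increasing regraduation of [0, 1] onto [0, 1]."""
    return p / (2.0 - p)

def h_inv(q):
    """Inverse of h: solves q = p / (2 - p) for p."""
    return 2.0 * q / (1.0 + q)

def circ(x, y):
    """Composition rule induced on the regraduated scale."""
    return h(h_inv(x) * h_inv(y))

def g(x):
    """Additive generator, so that circ(x, y) = g_inv(g(x) + g(y)) for x, y > 0."""
    return math.log(h_inv(x))

def g_inv(t):
    """Inverse of g."""
    return h(math.exp(t))

vals = [0.1, 0.35, 0.6, 0.9, 1.0]
assoc_err = max(
    abs(circ(circ(x, y), z) - circ(x, circ(y, z)))
    for x in vals for y in vals for z in vals
)
generator_err = max(
    abs(g_inv(g(x) + g(y)) - circ(x, y)) for x in vals for y in vals
)
```

Both error terms are at the level of floating-point round-off, which is consistent with `circ` being multiplication in disguise.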
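Lemma 11 (Normalization).

The algebraic structure $(\Omega, \mathcal{F}, \mathbb{P}, \times, R)$ is isomorphic to $(\Omega, \mathcal{F}, \mathbb{P}, \times, [0, 1])$.

Proof. From Lemma 10 we have for all $A, B$ that $\mathbb{P}(\emptyset \mid B) \leq \mathbb{P}(A \mid B) \leq \mathbb{P}(\Omega \mid B)$ and that $\mathbb{P}(\emptyset \mid B)$
and $\mathbb{P}(\Omega \mid B)$ are equal for all $B$. Hence, we have that $\min\{\mathbb{P}\} = \mathbb{P}(\emptyset \mid B)$ and $\max\{\mathbb{P}\} = \mathbb{P}(\Omega \mid B)$
are well-defined and finite, so the isomorphism is given by

$$\frac{\mathbb{P}(A \mid B) - \min\{\mathbb{P}\}}{\max\{\mathbb{P}\} - \min\{\mathbb{P}\}} \tag{37}$$

and the result follows. ■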


Lemma 12 (Scaling).

The algebraic structure $(\Omega, \mathcal{F}, \mathbb{P}, \times, [0, 1], N)$ is isomorphic to $(\Omega, \mathcal{F}, \mathbb{P}, \times, [0, 1], N)$ with
$N(1/2) = 1/2$.


Proof. First, note by Axiom 2 that $N$ is continuous and strictly decreasing. By Lemma 11, $N$ maps
$[0, 1]$ to itself. Then, following Van Horn, Brouwer’s Fixed Point Theorem implies that $N$
admits a fixed point $h$ such that $N(h) = h$, which - since $N$ is strictly decreasing - is unique. Since
$\mathbb{P}(\emptyset \mid B) \neq \mathbb{P}(\Omega \mid B)$, we have that $0 < h < 1$, therefore there exists an $m > 0$ such that $h^m = 1/2$.
Then $\mathbb{P}^m$ gives the required isomorphism. ■
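The rescaling step can be traced on a concrete case. The sketch below assumes a hypothetical negation function $N(x) = \sqrt{1 - x^2}$, chosen only because its fixed point is easy to check by hand; the fixed point is located by bisection, the exponent $m$ is solved from $h^m = 1/2$, and the negation is then re-expressed on the rescaled values.

```python
import math

def N(x):
    """Hypothetical negation function on [0, 1]: continuous and strictly decreasing."""
    return (1.0 - x * x) ** 0.5

def fixed_point(func, lo=0.0, hi=1.0, tol=1e-14):
    """Bisection for func(x) = x, valid because func(x) - x is strictly decreasing."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if func(mid) > mid:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

h = fixed_point(N)                 # the unique point with N(h) = h
m = math.log(0.5) / math.log(h)    # exponent chosen so that h ** m = 1/2

def N_rescaled(q):
    """The same negation, acting on rescaled values q = p ** m."""
    return N(q ** (1.0 / m)) ** m
```

For this choice of `N` the fixed point is $1/\sqrt{2}$ and $m = 2$, and the rescaled negation sends $1/2$ to $1/2$, as the lemma requires.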


Lemma 13 (Sum Rule). 

The algebraic structure (H, P, x, [0,1], N) is isomorphic to (H, P, x, [0,1], 1 — • )• 


Proof. We proceed by establishing a set of expressions N must satisfy. Consider 

O = ¥{0 \B)=N [P(H I B)] = N{1) (38) 

and analogously A^(0) = 1. By Lemma [T21 we may take N{l/2) = 1/2. Suppose that 0 < P{AB \ 
C) < ¥{B I C) < 1. From Lemma [TUI and Axiom 01 we have that 


F{AB^ I C) = P(A I C) ¥{B^ \ AC) = P(A | C) N 
and, following Paris 


P(H I AC) 


P(AH‘’ I C) = P[(A‘’ U H")(AH") I C] 

= N[F{AB I C)] N[F[A^ U H | (A" U B^)C]] . 

where the latter follows by AB'^ C A'’ U H'’ and de Morgan’s law. Now, 


F{AB I C) = P(A I C) F{B \ AC) 


hence 


F{B I AC) = 


F{AB I C) 
P(A I C) ' 


(39) 

(40) 

(41) 
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Similarly, 


¥[A^UB\ {A'^ U B'^)C] = 


P [A'^iA'^ U B^) I C] N [P(A I C)] 


F{A^ UB^\C) N [¥{AB \ C)] 
since A'^ = A^^^A^ U B^). Substituting this into the previous equation, it follows that 


(42) 


$$ P(A \mid C)\, N\!\left[\frac{P(AB \mid C)}{P(A \mid C)}\right] = N[P(AB \mid C)]\, N\!\left[\frac{N[P(A \mid C)]}{N[P(AB \mid C)]}\right] \tag{43} $$

and hence, letting $x = P(AB \mid C)$ and $y = P(A \mid C)$, we have that $N$ must satisfy the functional
equation

$$ y\, N\!\left(\frac{x}{y}\right) = N(x)\, N\!\left(\frac{N(y)}{N(x)}\right), \qquad N(0) = 1, \quad N(1) = 0, \quad N(1/2) = 1/2 \tag{44} $$

with $0 < x < y < 1$. By applying Axiom 5 and repeating the argument in Lemma 10, it can be
seen that this must hold for all pairs $(x, y) \in (0,1)^2$ with $x < y$. From these conditions, it is shown in Paris [38]
that the solution is $N(x) = 1 - x$. ■
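
A quick numerical check can make the role of each condition in (44) more tangible. The sketch below is an illustration only: the one-parameter family `make_N(k)`, with $N_k(x) = (1 - x^k)^{1/k}$, and the helpers `residual` and `max_residual` are hypothetical names introduced here, not objects from the paper or from Paris. A direct calculation shows both sides of the functional equation reduce to $(y^k - x^k)^{1/k}$ for every member of this family, so the equation and the boundary values alone do not single out $N(x) = 1 - x$; within this family, the normalization $N(1/2) = 1/2$ from Lemma 12 holds only for $k = 1$.

```python
def make_N(k):
    """Hypothetical family N_k(x) = (1 - x**k)**(1/k); k = 1 gives N(x) = 1 - x."""
    return lambda x: (1.0 - x ** k) ** (1.0 / k)

def residual(N, x, y):
    """Left side minus right side of the functional equation, for 0 < x < y < 1."""
    return y * N(x / y) - N(x) * N(N(y) / N(x))

def max_residual(N, steps=50):
    """Largest absolute residual over a grid of pairs x < y inside (0, 1)."""
    grid = [i / steps for i in range(1, steps)]
    return max(abs(residual(N, x, y)) for x in grid for y in grid if x < y)

N1 = make_N(1)   # the solution N(x) = 1 - x
N2 = make_N(2)   # satisfies the functional equation, but N2(1/2) != 1/2
```

Calling `max_residual(N1)` and `max_residual(N2)` should both give values at the level of floating-point rounding, whereas `N2(0.5)` is roughly 0.87 rather than 0.5.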

Lemma 14 (Finite Additivity). 

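We have that $P$ in $(\Omega, \mathcal{F}, P, \times, [0,1], 1 - \cdot)$ satisfies

$$ P\left(\bigcup_{i=1}^{n} A_i \,\middle|\, B\right) = \sum_{i=1}^{n} P(A_i \mid B) \tag{45} $$

for arbitrary disjoint $A_i \in \mathcal{F}$ and any positive integer $n$.

Proof. This follows from Lemma 13 by induction. ■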

Lemma 15 (Countable Additivity). 

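We have that $P$ in $(\Omega, \mathcal{F}, P, \times, [0,1], 1 - \cdot)$ satisfies

$$ P\left(\bigcup_{i=1}^{\infty} A_i \,\middle|\, B\right) = \sum_{i=1}^{\infty} P(A_i \mid B) \tag{46} $$

for arbitrary disjoint $A_i \in \mathcal{F}$.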


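Proof. Let $(A_1, A_2, \ldots)$ be a collection of disjoint sets in $\mathcal{F}$. Define $A_n^* = \bigcup_{i=1}^{n} A_i$ and $A = \bigcup_{i=1}^{\infty} A_i$.
Clearly $A_n^* \subseteq A_{n+1}^*$ for all positive integers $n$, with $A_n^* \nearrow A$, so

$$ P\left(\bigcup_{i=1}^{\infty} A_i \,\middle|\, B\right) = P\left(\lim_{n \to \infty} A_n^* \,\middle|\, B\right) = P(A \mid B). \tag{47} $$

On the other hand, by Lemma 14 and Axiom 2 we have

$$ \sum_{i=1}^{\infty} P(A_i \mid B) = \lim_{n \to \infty} P\left(\bigcup_{i=1}^{n} A_i \,\middle|\, B\right) = \lim_{n \to \infty} P(A_n^* \mid B) = P(A \mid B), \tag{48} $$

which gives the desired result. ■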
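
The limiting argument can be followed on a concrete sequence. The sketch below assumes, purely for illustration, the disjoint intervals $A_i = [2^{-i}, 2^{-(i-1)})$ in $[0,1]$ measured by length and conditioned on $B = [0,1]$; the function names are hypothetical. It compares the finite sums of the pieces with the measure of the increasing partial unions $A_n^* = [2^{-n}, 1)$, using exact rational arithmetic.

```python
from fractions import Fraction

def piece(i):
    """Endpoints of A_i = [2^-i, 2^-(i-1)) for i >= 1."""
    return Fraction(1, 2 ** i), Fraction(1, 2 ** (i - 1))

def length(interval):
    """Length of a half-open interval given as (lower, upper)."""
    lo, hi = interval
    return hi - lo

def sum_of_pieces(n):
    """Finite sum P(A_1 | B) + ... + P(A_n | B) under the length model."""
    return sum((length(piece(i)) for i in range(1, n + 1)), Fraction(0))

def partial_union_measure(n):
    """Measure of A_n^* = [2^-n, 1), an interval because the pieces abut."""
    return length((Fraction(1, 2 ** n), Fraction(1)))
```

Both quantities equal $1 - 2^{-n}$ for every $n$, and they increase to $1$, the length of the full union $(0, 1)$.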

Theorem 16 (Cox’s Theorem).

Every true-false logic under uncertainty is isomorphic to conditional probability. 

Proof. To show this, it suffices to check that $P(A \mid B)$ satisfies Kolmogorov’s Axioms up to isomorphism
for all $B$.

(K1) (Normalization) $P(\Omega \mid B) = 1$ for all $B$.

(K2) (Non-Negativity) $P(A \mid B) \ge 0$ for all $A, B$.

(K3) (Countable Additivity) For all countable collections of disjoint sets $A_i$ and all $B$, we have

$$ P\left(\bigcup_{i=1}^{\infty} A_i \,\middle|\, B\right) = \sum_{i=1}^{\infty} P(A_i \mid B). $$

By the preceding lemmas, we have that every true-false logic under uncertainty $(\Omega, \mathcal{F}, P, \circ, R, N)$
is isomorphic to $(\Omega, \mathcal{F}, P, \times, [0,1], 1 - \cdot)$, which satisfies K1 and K2 immediately,
and K3 by Lemma 15. ■

Our construction of standard conditional probability per Kolmogorov is now complete. We 
conclude by noting that unconditional probability may immediately be constructed by defining 
$P(A) = P(A \mid \Omega)$.
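
To make the last definition concrete, the sketch below works through the example of a pair of fair dice. It assumes a uniform counting model, $P(A \mid B) = |A \cap B| / |B|$, which is chosen here for illustration and is not the construction used in the proofs above; the names `OMEGA`, `cond_prob` and `prob` are hypothetical. Unconditional probability is obtained by conditioning on the whole outcome space.

```python
from fractions import Fraction
from itertools import product

# Outcome space for two dice: all ordered pairs (first roll, second roll).
OMEGA = frozenset(product(range(1, 7), repeat=2))

def cond_prob(A, B):
    """P(A | B) = |A ∩ B| / |B| under the assumed uniform model; B must be nonempty."""
    if not B:
        raise ValueError("conditioning event must be nonempty")
    return Fraction(len(A & B), len(B))

def prob(A):
    """Unconditional probability, defined as P(A) = P(A | Omega)."""
    return cond_prob(A, OMEGA)

sum_is_7 = frozenset(w for w in OMEGA if w[0] + w[1] == 7)
first_is_3 = frozenset(w for w in OMEGA if w[0] == 3)
doubles = frozenset(w for w in OMEGA if w[0] == w[1])
```

In this model `prob(sum_is_7)` is 1/6, the events `doubles` and `sum_is_7` are disjoint so their probabilities add, and the product rule $P(AB \mid C) = P(A \mid C)\, P(B \mid AC)$ can be confirmed directly with `cond_prob`.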


5 Discussion 


5.1. Philosophical implications. Probability may be interpreted as the unique extension of true-false 
logic under uncertainty in a Jaynesian sense, made precise through the axioms in Section 2. Together
with the frequentist interpretation - via notions of volume, area and repeated events - and the de 
Finetti interpretation via notions of fair betting odds, this gives another fundamental axiomatic 
way to understand exactly what probability theory is. The mathematical beauty is that no matter 
which interpretation is chosen, under appropriate axioms the resulting theory is the same. 

As with the de Finetti interpretation, but unlike the frequentist interpretation, conditional probability
is the primitive concept from a Jaynesian philosophical perspective. Similarly, just as in the
de Finetti interpretation, Stone’s Representation Theorem allows us to interpret probabilistic
events as concrete true-false propositions rather than merely abstract sets - indeed, it gives a philosophical
reason why a $\sigma$-algebra should be used rather than some other algebraic structure such as
a power set.

This interpretation is of foundational interest. It offers one explanation for why probability theory
is so ubiquitous in practice, and why Bayesian representations exist for virtually all approaches to
inference, such as the regularized maximum likelihood example considered briefly in Section 1, in
which the regularization term can be seen as a log-prior probability. Indeed, Jaynes’ framework can
be used to motivate the Bayesian approach to statistical inference, just as de Finetti’s framework
has.


Finally, Cox’s Theorem also has implications for the notion of uncertainty in the context of a
probabilistic model. One may philosophically choose to split uncertainty into three kinds: parameter
uncertainty (the uncertainty of a parameter given some model), model uncertainty (the uncertainty of a
model within some space of models), and systemic uncertainty (the uncertainty of the system of
reasoning being used within the space of possible systems of reasoning). In this context, Cox’s
Theorem implies that there is no systemic uncertainty if and only if the system of reasoning used
is probability. In a given problem, of course, other forms of uncertainty may arise - for instance,
uncertainty about convergence of computational methods.


5.2. Mathematical implications. One of our fundamental goals is combining full mathematical rigor
with maximum generality. Compared to Paris [38], our construction does not take as assumption,
for instance, that $\circ$ is strictly increasing, or that the range of the function $P$ is $[0,1]$ - in our work,
these are proven. This makes our result more general than that of other authors in a mathematical
sense, not just a philosophical one.


Cox’s Theorem is a representation theorem: it assumes that the assignments $P(A \mid B)$ are self-consistent.
To prove that a true-false logic under uncertainty exists, we can use the standard result
that taking $\Omega = [0,1]$, $\mathcal{F}$ to be the Borel $\sigma$-algebra, and $P$ to be the conditional Lebesgue measure
results in a theory that is not contradictory. As a consequence, Cox’s Theorem says essentially
nothing about conditioning on sets of measure zero. It is a standard result in probability theory
that this can be done, but it is nontrivial - see Çinlar for details.

Some form of continuity is needed for Cox’s Theorem to hold: Paris [38] takes as assumption
that $\circ$ and $N$ are continuous functions. Our axiom of Sequential Continuity directly implies this,
and we would argue that it constitutes a natural way to formalize Jaynes’ desideratum that small
changes in events should yield small changes in probabilities. The natural ordering it induces also
allows us to assume virtually no properties about $\circ$ or $N$ - many other variants of the theorem, for
instance, assume them to be monotone functions. This assumption also suffices to yield a countably
additive theory of probability rather than a finitely additive one - indeed, as noted earlier, a similar
approach is taken by some authors in the de Finetti system. There has been considerable
historical controversy over the use of Sequential Continuity in that approach. In our view, if we
must assume continuity for the proof to hold, then choosing our particular definition does not seem
restrictive - though we recognize that others may disagree, and that it is possible our argument
may apply mutatis mutandis without it.


Our Consistency under Extension axiom ends up being even more powerful than Paris’ Density
axiom. It allows probability on finite sets, by forcing them to behave in a way analogous with infinite
sets, thus bypassing the issue that leads to the counterexample in Halpern. We find it deeply
interesting that the idea of repeated events - core to the frequentist interpretation of probability -
suffices to replace assumptions that are much more abstract and philosophically restrictive in our
context.


5.3. Relationship with other theories. The assumptions in Section 3 illuminate the relationship between
probability and other theories. In particular, one difference between probability theory and
Dempster-Shafer Theory is that uncertainty in the latter is quantified with intervals rather
than unique real numbers. Similarly, one fundamental difference between Probability Theory and
Quantum Theory is that the latter arises when working with an appropriate lattice of $\sigma$-algebras
instead of a single $\sigma$-algebra, and with non-real-valued versions of probability. See Goyal et al.
and Holik et al. [28], who suggest that Cox’s Theorem may be adapted to this purpose - if
true, this would help understand the philosophical meaning and interpretation of Quantum Theory
via extensions of Jaynes’ framework. Our formalism offers a way to explore these ideas in future
work.